require(ASMap)
require(RColorBrewer)
require(dplyr)
require(ggplot2)
require(reshape2)
require(ggparallel)
require(wgaim)
setwd("/home/exserta/Documents/master_project_noelle/projects/BioNano-exserta/GBS_LepMAP3")
dopdf = TRUE # to create pdf document
Heterozygous sites have been replaced by NA. The number of heterozygous sites was 37564, the number of axillaris sites 143739 and the number of exserta sites 179837. Heterozygous calls therefore made up 0.1040151 of all called genotypes (37564 / 361140).
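As a quick check of the arithmetic, the reported counts can be turned into a heterozygosity value in two ways, depending on whether the heterozygous calls are included in the denominator. The variable names below are illustrative only and are not part of the pipeline.

```r
# Genotype counts reported in the text
n_het <- 37564
n_ax  <- 143739
n_ex  <- 179837

# Fraction of all called genotypes that are heterozygous
het_of_all <- n_het / (n_het + n_ax + n_ex)

# Ratio of heterozygous calls to homozygous calls
het_vs_hom <- n_het / (n_ax + n_ex)

round(c(het_of_all = het_of_all, het_vs_hom = het_vs_hom), 7)
```

The first value corresponds to the 0.1040151 quoted here; the second corresponds to the 0.1160902 computed further down from F7-K.geno.csv, which divides by the homozygous calls only.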
Genotype frequencies among individuals: the number of markers per individual which are of a given genotype.
indvs <- read.table("F7-K/LM3_F7_K.INDIVIDUAL.genoSUMMARY.csv", header=T, sep=",")
i_nmar <- unique(indvs$tot)
print(as.character(indvs[which(indvs$NA. > 1000), "individual"])) # are removed in cleaned marker set
## [1] "RIL_35" "RIL_44" "RIL_67" "RIL_88"
ggplot(data = indvs, aes(AX, EX)) + geom_point(alpha = 0.2)
if(dopdf == T){ggsave("figures/filtering1.pdf")}
## Saving 7 x 5 in image
indvs <- melt(indvs)
## Using individual as id variables
indvs <- indvs %>% filter(variable != "tot")
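# check for excessive missing data
# after melt(), the NA. counts are the rows with variable == "NA." (indvs$NA. no longer exists)
indvs[which(indvs$variable == "NA." & indvs$value > 1000), ] # the individuals already flagged above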
ggplot(indvs, aes(factor(variable), value / i_nmar)) +
  geom_jitter(height = 0, width = 0.45, size = 0.2, colour = "grey50") +
  geom_violin(aes(fill = variable), draw_quantiles = 0.5) +
  labs(x = "Genotypes", y = "Frequency",
       title = "Genotype frequencies among individuals",
       subtitle = paste("Median : AX = ", median(indvs[indvs$variable == "AX", "value"] / i_nmar),
                        " EX = ", median(indvs[indvs$variable == "EX", "value"] / i_nmar),
                        "Missing = ", median(indvs[indvs$variable == "NA.", "value"] / i_nmar)))
if(dopdf == T){ggsave("figures/filtering2.pdf")}
## Saving 7 x 5 in image
Genotype frequencies among markers : number of individuals per marker which are of a given genotype.
mrks <- read.table("F7-K/LM3_F7_K.MARKER.genoSUMMARY.csv", header=T, sep=",")
mrks <- melt(mrks)
## Using marker as id variables
mrks <- mrks %>% filter(variable != "tot")
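# check for excessive missing data
# after melt(), the NA. counts are the rows with variable == "NA."
# (mrks$NA. is NULL after melt(), so a check on that column always returns 0 rows)
mrks[which(mrks$variable == "NA." & mrks$value > 100), ]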
ggplot(data = mrks, aes(value, fill = variable)) +
geom_density(alpha = 0.2)
if(dopdf == T){ggsave("figures/filtering3.pdf")}
## Saving 7 x 5 in image
ggplot(mrks, aes(factor(variable), value / 195)) +
  geom_jitter(height = 0, width = 0.45, size = 0.1, colour = "grey50") +
  geom_violin(aes(fill = variable), draw_quantiles = 0.5) +
  labs(x = "Genotypes", y = "Frequency",
       title = "Genotype frequencies among markers",
       subtitle = paste("Median AX = ", median(mrks[mrks$variable == "AX", "value"] / 195),
                        "EX = ", median(mrks[mrks$variable == "EX", "value"] / 195),
                        "missing = ", median(mrks[mrks$variable == "NA.", "value"] / 195)))
if(dopdf == T){ggsave("figures/filtering4.pdf")}
## Saving 7 x 5 in image
geno <- read.table("F7-K/F7-K.geno.csv", header=T, sep=",")
nhet <- sum(as.vector(apply(geno == "HET", 2, sum)), na.rm=T) / (sum(as.vector(apply(geno == "AX", 2, sum)), na.rm=T) + sum(as.vector(apply(geno == "EX", 2, sum)), na.rm=T))
Heterozygosity (HET / (AX + EX)): 0.1160902 (about 11.6 %)
In the input file, axillaris genotype is encoded by “AX”, heterozygous as “HET” and exserta as “EX”. Missing data is encoded as “-”.
| input file | read.cross | f7_K |
|---|---|---|
| AX | AA | 1 |
| HET | AB | |
| EX | BB | 2 |
| - | NA | |
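The correspondence in the table can be written down as a small lookup, which is handy for checking a few raw calls by hand. This is only a sketch of the mapping; read.cross performs the actual conversion, and the object names are made up for illustration.

```r
# Lookup from input-file codes to read.cross genotype labels (illustrative)
code_map <- c(AX = "AA", HET = "AB", EX = "BB", "-" = NA)

# Hypothetical raw calls for one marker
raw_calls <- c("AX", "EX", "-", "HET", "AX")

# Translate the calls; the missing-data code "-" becomes NA
recoded <- unname(code_map[raw_calls])
recoded
```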
f7_K <- read.cross(format = "csv", file = "F7-K/LM3_F7_K.markers.clean.csv", F.gen = 7, genotypes = c("AX", "HET", "EX"), na.strings = "-")#, crosstype = "riself")
## --Read the following data:
## 191 individuals
## 1852 markers
## 1 phenotypes
## Warning in summary.cross(cross): Strange genotype pattern.
## Warning in max(maxsp, na.rm = TRUE): no non-missing arguments to max;
## returning -Inf
## --Cross type: bcsft
f7_K <- convert2riself(f7_K)
The population needs to be converted to riself (a RIL population obtained by selfing over many generations). This assumes a heterozygosity of 0, which is why the heterozygous calls were removed manually beforehand.
Pull out markers which are co-located. Why not also remove markers which are in linkage disequilibrium (LD)? We expect and see LD, but we do not want to lose too many markers, so we leave them in.
f7_K <- pullCross(f7_K, type = "co.located")
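As a rough illustration of what co-located means, the sketch below flags markers whose genotype vectors are identical across all individuals, so that no recombination is observed between them. It uses a made-up genotype matrix and base R only. pullCross applies its own criteria (including its handling of missing data), so this is a simplification and not the ASMap implementation.

```r
# Hypothetical genotype matrix: rows = individuals, columns = markers (1 = AA, 2 = BB)
geno_toy <- cbind(
  m1 = c(1, 1, 2, 2, 1),
  m2 = c(1, 1, 2, 2, 1),   # identical to m1
  m3 = c(1, 2, 2, 2, 1),
  m4 = c(1, 2, 2, 2, 1)    # identical to m3
)

# Collapse each marker to a string and group markers sharing the same pattern
pattern <- apply(geno_toy, 2, paste, collapse = "")
bins    <- split(names(pattern), pattern)

# Keep the first marker of each bin; the remaining ones are the co-located markers
representatives <- vapply(bins, `[`, character(1), 1)
co_located      <- setdiff(colnames(geno_toy), representatives)
co_located
```

Only one representative per bin carries information for ordering, which is why the remaining markers can be set aside and pushed back in later.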
Q: Why does ordering not reduce the genetic distance? If the distance is inflated by small errors within the chromosomes, ordering should reduce the number of crossovers found between markers, at least a little. A: A reduction is not guaranteed, although it can happen. The absence of a reduction is therefore normal and not a sign of a low-quality map.
mvest.bc = T is required in the first pass; without it, the clustering into linkage groups does not work as well. mvest.bc imputes missing markers before the markers are clustered into linkage groups.
#set bychr = FALSE to allow complete reconstruction of map
map1 <- mstmap.cross(f7_K, bychr = F, dist.fun = "kosambi", trace = TRUE, detectBadData = F, p.value = 1e-09, mvest.bc = T, return.imputed = T)
# order markers within linkage groups
map1 <- mstmap.cross(map1, bychr = T, dist.fun = "kosambi", trace = TRUE, detectBadData = F, p.value = 1e-09, mvest.bc = F, return.imputed = T)
summary(map1)
## Warning in summary.cross(map1): Some markers at the same position on chr L.
## 1,L.10,L.2,L.3,L.4,L.5,L.6,L.7,L.8,L.9; use jittermap().
## RI strains via selfing
##
## No. individuals: 191
##
## No. phenotypes: 1
## Percent phenotyped: 100
##
## No. chromosomes: 12
## Autosomes: L.1 L.10 L.11 L.12 L.2 L.3 L.4 L.5 L.6 L.7 L.8 L.9
##
## Total markers: 744
## No. markers: 182 3 1 1 82 93 79 144 75 70 4 10
## Percent genotyped: 89.8
## Genotypes (%): AA:46.4 BB:53.6
The summary above was produced before pushing the co-located markers back in.
The expected recombination rate is one crossover per chromosome per generation. Plotting profileGen per chromosome therefore requires xo.lambda = 7 (seven generations). If it is plotted for all linkage groups together, xo.lambda should be set to 49 (7 generations x 7 chromosomes).
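The two xo.lambda values follow from simple multiplication. The sketch below spells this out; the rate of one crossover per chromosome per generation is the assumption stated in the text, and the chromosome count of 7 is inferred from the seven large linkage groups (L.1 to L.7) that are plotted.

```r
# Assumptions: ~1 crossover per chromosome per generation, F7 RILs, 7 chromosomes
xo_per_chr_per_gen <- 1
n_generations      <- 7
n_chromosomes      <- 7

# Expected crossovers per individual on a single chromosome
xo_lambda_per_chr <- xo_per_chr_per_gen * n_generations

# Expected crossovers per individual summed over all chromosomes
xo_lambda_all <- xo_lambda_per_chr * n_chromosomes

c(per_chromosome = xo_lambda_per_chr, genome_wide = xo_lambda_all)
```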
heatMap(map1, lmax=15)
## Warning in heatMap(map1, lmax = 15): Running est.rf.
for(i in paste("L.", seq(1,7), sep="")){
profileGen(map1, stat.type = c("xo", "dxo", "miss"), xo.lambda = 7, chr=i)
}
profileMark(map1, stat.type = c("seg.dist", "dxo", "erf", "lod"), id = "Genotype", layout = c(1, 4), type = "l")
## Warning in summary.cross(cross): Some markers at the same position on chr
## L.1,L.10,L.2,L.3,L.4,L.5,L.6,L.7,L.8,L.9; use jittermap().
mean(countXO(map1))
## [1] 10.46073
heatMap: clear linkage groups are visible.
profileGen (per chromosome):
* No pink circles in the per-chromosome plots.
* Crossovers: the expected number of crossovers per chromosome is ~7 (because of 7 generations of RIL). Fewer is okay, because heterozygosity was removed artificially: for each heterozygous marker, a potential double crossover was removed.
profileMark:
* Segregation distortion: do not use it. A higher value nevertheless marks regions with high segregation distortion, where linkage is not as expected; we would expect such behaviour at the speciation island, for example.
* Double crossovers: 6 is very high. The number of double crossovers is expected to be lower after imputation; around 2-3 would be OK.
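To make the two counts concrete, the sketch below counts crossovers and double crossovers along one chromosome of a single line. The helper names and the genotype vector are hypothetical, and the logic is a simplified illustration of the idea, not the code behind countXO or profileGen.

```r
# Count genotype switches along one chromosome, skipping missing calls
count_xo <- function(g) {
  g <- g[!is.na(g)]
  sum(diff(g) != 0)
}

# Count single markers whose genotype differs from both of its neighbours
count_dxo <- function(g) {
  g <- g[!is.na(g)]
  n <- length(g)
  if (n < 3) return(0L)
  mid <- 2:(n - 1)
  sum(g[mid] != g[mid - 1] & g[mid] != g[mid + 1])
}

# Hypothetical RIL genotypes in map order (1 = AA, 2 = BB, NA = missing)
ril <- c(1, 1, 2, 1, 1, NA, 2, 2, 2)
count_xo(ril)
count_dxo(ril)
```

A lone marker that flips genotype and flips straight back contributes two crossovers and one double crossover, which is why genotyping errors inflate both counts.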
“You can then use cross2int() in this package to perform a smart imputation on your linkage map. It does two things, it condenses the co-located markers into unique markers and this imputes most missing alleles. Then remaining missing values are then imputed using a probabilistic numerical flanking marker algorithm.” cross2int converts cross data to “interval” data and imputes missing markers.
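The idea of flanking-marker imputation can be illustrated with a deterministic toy version: a missing call is filled in only when the nearest observed markers on both sides agree. The quoted description refers to a probabilistic algorithm, so the function below (a hypothetical helper) is a simplification for intuition and not what cross2int does internally.

```r
# Fill a missing genotype when the nearest observed flanking markers agree
impute_flanking <- function(g) {
  known <- which(!is.na(g))          # positions observed in the original data
  for (i in which(is.na(g))) {
    left  <- known[known < i]
    right <- known[known > i]
    if (length(left) > 0 && length(right) > 0) {
      l <- g[max(left)]
      r <- g[min(right)]
      if (l == r) g[i] <- l          # flanks agree: copy their genotype
    }
  }
  g
}

# Hypothetical genotypes along one chromosome (1 = AA, 2 = BB)
toy     <- c(1, NA, 1, 2, NA, NA, 2, NA, 1, NA)
imputed <- impute_flanking(toy)
imputed
```

Positions where the flanks disagree, or where there is no marker on one side, stay missing in this sketch; a probabilistic method would instead weight the two flanks by their map distance.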
# push back in markers
map1 <- pushCross(map1, type="co.located")
# order again
map1 <- mstmap(map1, bychr = T, dist.fun = "kosambi", trace = TRUE, detectBadData = F, p.value = 1e-09, mvest.bc = F, return.imputed = T)
# saveRDS(map1, "map1.rds")
Some markers are now on top of each other. We could use jittermap to move them apart, but that would introduce an artificial distance between markers. TODO: decide whether to do that.
if(dopdf == T){
pdf("figures/QC_map_before_imp.pdf", onefile=T, paper="a4r", width = 11)
heatMap(map1, lmax=15)
for(i in paste("L.", seq(1,7), sep="")){
profileGen(map1, stat.type = c("xo", "dxo", "miss"), xo.lambda = 7, chr=i) # per-chromosome expectation, as in the interactive plots
}
profileMark(map1, stat.type = c("seg.dist", "dxo", "erf", "lod"), id = "Genotype", layout = c(1, 4), type = "l")
dev.off()
}
## Warning in heatMap(map1, lmax = 15): Running est.rf.
## Warning in summary.cross(cross): Some markers at the same position on chr
## L.1,L.10,L.2,L.3,L.4,L.5,L.6,L.7,L.8,L.9; use jittermap().
## png
## 2
Impute with wgaim
map1 <- readRDS("map1.rds")
# impute
map1 <- cross2int(map1, id="Genotype", rem.mark=F) # rem.mark = F to not take out colocated markers from the map
## Warning in miss.q(el$theta, el$imputed.data): Line RIL_50 has missing
## values across the whole of a chromosome, .. These have been replaced by
## 0's.
# geno contains the full genetic map
map1$geno$L.1$map[1:10]
## Peex113Ctg17958_624309 Peex113Ctg17958_624312 Peex113Ctg17958_866375
## 0.0000000 0.9328071 0.9328071
## Peex113Ctg17958_866386 Peex113Ctg17961_1433683 Peex113Ctg17963_181442
## 0.9328071 1.9957285 3.0786914
## Peex113Ctg17963_238591 Peex113Ctg17963_238690 Peex113Ctg17966_963040
## 3.0786914 3.0786914 4.5777366
## Peex113Ctg18098_811695
## 8.4575431
# imputed contains unique markers
map1$imputed.geno$L.1$map[1:4]
## Peex113Ctg17958_624309 Peex113Ctg17958_624312 Peex113Ctg17961_1433683
## 0.0000000 0.9328071 1.9957285
## Peex113Ctg17963_181442
## 3.0786914
saveRDS(map1, "map1_imputed.rds")
write.cross(map1, format="csv") # saves map as "data.csv"
Plotting the map
plot.map(map1, main = "Genetic map, imputed")
# link.map(map1, chr="L.3")
# geno.image(map1, main = "Genetic map, imputed",alternate.chrid=T)
# knitr::kable(head(geno.table(map1)))
Keep in mind that linkage group 8 (L.8) may consist of the markers which could not be associated with any other group. It possibly contains markers from chloroplast or mitochondrial DNA.
geno.table returns a table of the genetic map:
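The p-value comes from a chi-square test for Mendelian segregation: are the observed genotype counts compatible with the expected ones? The statistic is \(\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}\), summed over all genotype classes \(i\).
Markers with a low p-value deviate from the expected segregation, i.e. they show segregation distortion; a high p-value means the observed counts are compatible with the expectation.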
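The test can be reproduced by hand for a single marker. The sketch below uses made-up genotype counts and assumes an expected 1:1 ratio of AA to BB, as for a RIL population without heterozygotes; it then compares the manual result with base R's chisq.test.

```r
# Hypothetical genotype counts at one marker (expected ratio 1:1)
observed <- c(AA = 80, BB = 111)
expected <- sum(observed) * c(0.5, 0.5)

# Chi-square statistic computed term by term: sum((O - E)^2 / E)
chi_sq  <- sum((observed - expected)^2 / expected)
p_value <- pchisq(chi_sq, df = length(observed) - 1, lower.tail = FALSE)

# Same test via base R; statistic and p-value should agree with the manual values
test <- chisq.test(observed, p = c(0.5, 0.5))

# -log10(p) is the scale on which the seg.dist profile is drawn
c(chi_sq = chi_sq, p_value = p_value, neg_log10_p = -log10(p_value))
```

With these counts the p-value falls below 0.05, so this hypothetical marker would count as distorted at that threshold.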
summaryMap(map1)
Lower triangle: pairwise LOD scores; upper triangle: pairwise estimated recombination fractions (RFs). The heat of the lower triangle should match the heat of the upper triangle. Markers within linkage groups show consistent linkage, and the linkage within groups is much stronger than the linkage between groups, so a clear clustering was possible.
A good heat map shows that the construction process was successful. No detailed problems are shown.
heatMap(map1, lmax=15)
## Warning in heatMap(map1, lmax = 15): Running est.rf.
The recombination rate should be appropriate; this is one of the key quality characteristics.
Barley: each individual of the population has an expected recombination rate of ~14 on a 200 cM chromosome.
Genotypes that exceed the expected recombination rate are shown in the graph below.
Calculation of xo.lambda, the expected recombination rate: TODO, continue here.
for(i in paste("L.", seq(1,7), sep="")){
profileGen(map1, stat.type = c("xo", "dxo", "miss"), xo.lambda = 7, chr=i)
}
Profile individual marker and interval statistics.
profileMark(map1, stat.type = c("seg.dist", "dxo", "erf", "lod"), id = "Genotype", layout = c(1, 4), type = "l")
## Warning in summary.cross(cross): Some markers at the same position on chr
## L.1,L.10,L.2,L.3,L.4,L.5,L.6,L.7,L.8,L.9; use jittermap().
## Warning in summary.cross(cross): Invalid genotypes.
## Observed genotypes: 0 1 2
# with ABHgenotypeR, it was 13
mean(countXO(map1))
## [1] 10.35602
* seg.dist: profile of the -log10 p-value from a test of segregation distortion for each marker
* dxo: profile of the number of double crossovers occurring at each marker
* erf: profile of the recombination fractions for intervals
* lod: profile of the LOD score
if(dopdf == T){
pdf("figures/QC_map_imputed.pdf", onefile=T, paper="a4r", width = 11)
heatMap(map1, lmax=15)
for(i in paste("L.", seq(1,7), sep="")){
profileGen(map1, stat.type = c("xo", "dxo", "miss"), xo.lambda = 7, chr=i) # per-chromosome expectation, as in the interactive plots
}
profileMark(map1, stat.type = c("seg.dist", "dxo", "erf", "lod"), id = "Genotype", layout = c(1, 4), type = "l")
plot.map(map1, main = "Genetic map, imputed")
dev.off()
}
## Warning in heatMap(map1, lmax = 15): Running est.rf.
## Warning in summary.cross(cross): Some markers at the same position on chr
## L.1,L.10,L.2,L.3,L.4,L.5,L.6,L.7,L.8,L.9; use jittermap().
## Warning in summary.cross(cross): Invalid genotypes.
## Observed genotypes: 0 1 2
## png
## 2
Print map to see contig names and distances of markers to each other.
map1$geno$L.1$map
## Peex113Ctg17958_624309 Peex113Ctg17958_624312 Peex113Ctg17958_866375
## 0.0000000 0.9328071 0.9328071
## Peex113Ctg17958_866386 Peex113Ctg17961_1433683 Peex113Ctg17963_181442
## 0.9328071 1.9957285 3.0786914
## Peex113Ctg17963_238591 Peex113Ctg17963_238690 Peex113Ctg17966_963040
## 3.0786914 3.0786914 4.5777366
## Peex113Ctg18098_811695 Peex113Ctg18098_811726 Peex113Ctg18118_24362
## 8.4575431 15.9865907 17.0188319
## Peex113Ctg18109_468871 Peex113Ctg18108_1243493 Peex113Ctg18108_1241761
## 17.0188319 17.8404601 18.6596011
## Peex113Ctg18118_817351 Peex113Ctg18118_817355 Peex113Ctg18121_15280
## 20.0165236 20.0165236 21.5379445
## Peex113Ctg18121_15283 Peex113Ctg18118_822185 Peex113Ctg18121_15295
## 21.5379445 21.5379445 24.8457017
## Peex113Ctg18121_15297 Peex113Ctg18121_15303 Peex113Ctg18121_15363
## 27.7756803 28.5117360 30.5587146
## Peex113Ctg18121_15394 Peex113Ctg18121_15420 Peex113Ctg18121_64757
## 31.7098985 31.7098985 36.2782543
## Peex113Ctg08628_39755 Peex113Ctg08984_47666 Peex113Ctg08984_66709
## 41.3692243 41.3692243 42.7843368
## Peex113Ctg00860_16910 Peex113Ctg00860_16930 Peex113Ctg00860_116302
## 43.9568844 43.9568844 43.9568844
## Peex113Ctg17695_690561 Peex113Ctg17696_353203 Peex113Ctg17699_257949
## 48.4521765 49.1914107 51.0831375
## Peex113Ctg17903_1178727 Peex113Ctg17903_1178732 Peex113Ctg18472_1119111
## 53.0107604 53.0107604 54.5418393
## Peex113Ctg18472_1119132 Peex113Ctg18472_1119119 Peex113Ctg18472_1119147
## 54.5418393 54.5418393 55.2652219
## Peex113Ctg18472_1369036 Peex113Ctg17674_1729764 Peex113Ctg17674_1259959
## 55.9732444 59.6882672 59.6882672
## Peex113Ctg17674_1259956 Peex113Ctg17674_1259958 Peex113Ctg18651_10760
## 59.6882672 59.6882672 60.3323127
## Peex113Ctg00004_119009 Peex113Ctg18639_25620 Peex113Ctg18664_402032
## 60.3323127 61.5638813 63.7177793
## Peex113Ctg18664_402051 Peex113Ctg18665_125436 Peex113Ctg18666_49208
## 63.7177793 64.4969610 64.4969610
## Peex113Ctg18666_49221 Peex113Ctg18666_49245 Peex113Ctg07912_36464
## 64.4969610 64.4969610 65.2798569
## Peex113Ctg17727_241766 Peex113Ctg17727_241946 Peex113Ctg17727_241978
## 66.0638840 66.0638840 66.0638840
## Peex113Ctg17727_241996 Peex113Ctg17727_242028 Peex113Ctg17727_242072
## 66.0638840 66.0638840 66.0638840
## Peex113Ctg17727_288558 Peex113Ctg17727_288561 Peex113Ctg17727_288627
## 66.0638840 66.0638840 66.0638840
## Peex113Ctg17727_297659 Peex113Ctg17729_212450 Peex113Ctg17729_212750
## 66.0638840 66.0638840 66.0638840
## Peex113Ctg17729_226721 Peex113Ctg17727_241810 Peex113Ctg17727_241932
## 66.0638840 66.0638840 66.0638840
## Peex113Ctg17727_241944 Peex113Ctg17715_249680 Peex113Ctg17714_834831
## 66.0638840 66.8527750 66.8527750
## Peex113Ctg17715_287486 Peex113Ctg17715_472331 Peex113Ctg17714_769330
## 66.8527750 66.8527750 68.1678862
## Peex113Ctg17712_994913 Peex113Ctg17714_703125 Peex113Ctg17714_757638
## 68.1678862 68.1678862 68.1678862
## Peex113Ctg17714_769423 Peex113Ctg17712_994839 Peex113Ctg17712_994886
## 68.1678862 70.2337100 70.2337100
## Peex113Ctg17729_235620 Peex113Ctg17729_235638 Peex113Ctg17609_324476
## 71.0806389 71.0806389 72.4665555
## Peex113Ctg17609_223199 Peex113Ctg17609_223193 Peex113Ctg17609_86903
## 73.9869299 73.9869299 75.3995740
## Peex113Ctg17609_223163 Peex113Ctg17609_324398 Peex113Ctg17609_223172
## 75.3995740 75.3995740 75.3995740
## Peex113Ctg17609_324401 Peex113Ctg18238_909909 Peex113Ctg18237_331163
## 75.3995740 76.6100367 76.6100367
## Peex113Ctg18237_313478 Peex113Ctg17550_127275 Peex113Ctg18233_449126
## 76.6100367 79.9398654 79.9398654
## Peex113Ctg17550_131027 Peex113Ctg18233_542515 Peex113Ctg18233_542564
## 79.9398654 79.9398654 79.9398654
## Peex113Ctg18237_252537 Peex113Ctg18237_252567 Peex113Ctg17550_130992
## 79.9398654 79.9398654 79.9398654
## Peex113Ctg17550_131042 Peex113Ctg17550_131122 Peex113Ctg17550_131125
## 79.9398654 79.9398654 79.9398654
## Peex113Ctg17550_149925 Peex113Ctg17550_149933 Peex113Ctg18700_129727
## 79.9398654 79.9398654 81.3365510
## Peex113Ctg00050_84557 Peex113Ctg10388_80893 Peex113Ctg18684_432949
## 81.3365510 81.3365510 81.3365510
## Peex113Ctg18686_175956 Peex113Ctg18694_197814 Peex113Ctg18694_520285
## 81.3365510 81.3365510 81.3365510
## Peex113Ctg18700_129738 Peex113Ctg00050_84548 Peex113Ctg17550_152697
## 81.3365510 81.3365510 81.3365510
## Peex113Ctg17550_152698 Peex113Ctg00043_151704 Peex113Ctg11741_208907
## 81.3365510 82.3010152 84.1142089
## Peex113Ctg11685_238514 Peex113Ctg18054_87464 Peex113Ctg17841_497544
## 86.5734234 87.7784443 89.4481909
## Peex113Ctg10531_23725 Peex113Ctg18472_716349 Peex113Ctg17769_321336
## 91.6782688 94.5740174 97.6705579
## Peex113Ctg17774_16638 Peex113Ctg17760_706973 Peex113Ctg17763_269681
## 97.6705579 97.6705579 97.6705579
## Peex113Ctg17766_583688 Peex113Ctg17774_16848 Peex113Ctg17769_321296
## 97.6705579 97.6705579 99.7237082
## Peex113Ctg11453_37447 Peex113Ctg01106_37059 Peex113Ctg17769_321179
## 101.9720907 101.9720907 101.9720907
## Peex113Ctg17768_36371 Peex113Ctg11453_37414 Peex113Ctg17776_34542
## 101.9720907 104.1822244 104.1822244
## Peex113Ctg17757_486006 Peex113Ctg05921_557310 Peex113Ctg17757_486009
## 104.1822244 104.1822244 104.1822244
## Peex113Ctg17774_16661 Peex113Ctg17775_574112 Peex113Ctg18492_1154177
## 104.1822244 104.1822244 104.1822244
## Peex113Ctg18652_191625 Peex113Ctg18652_191646 Peex113Ctg00757_62711
## 104.1822244 104.1822244 106.0525818
## Peex113Ctg00790_20622 Peex113Ctg00790_21453 Peex113Ctg18368_14411
## 106.0525818 106.0525818 107.8189561
## Peex113Ctg18369_691023 Peex113Ctg18378_238368 Peex113Ctg18378_238447
## 107.8189561 107.8189561 107.8189561
## Peex113Ctg18378_238430 Peex113Ctg18378_238432 Peex113Ctg18369_691395
## 107.8189561 107.8189561 107.8189561
## Peex113Ctg18364_770226 Peex113Ctg18363_61632 Peex113Ctg18363_61640
## 107.8189561 109.7641784 109.7641784
## Peex113Ctg18364_770220 Peex113Ctg01630_256927 Peex113Ctg16000_353851
## 109.7641784 112.0558822 112.0558822
## Peex113Ctg01616_47771 Peex113Ctg01641_44685 Peex113Ctg01641_44726
## 112.0558822 112.0558822 112.0558822
## Peex113Ctg17792_168019 Peex113Ctg10921_452655 Peex113Ctg17792_517927
## 113.5612760 113.5612760 113.5612760
## Peex113Ctg17792_517951 Peex113Ctg17976_133197 Peex113Ctg17976_133201
## 113.5612760 115.0606838 115.0606838
## Peex113Ctg17976_133204 Peex113Ctg17976_133208 Peex113Ctg17976_394839
## 115.0606838 115.0606838 115.0606838
## Peex113Ctg17976_394945 Peex113Ctg18564_75305 Peex113Ctg18564_75325
## 115.0606838 115.0606838 115.0606838
## Peex113Ctg18564_101517 Peex113Ctg18564_101524 Peex113Ctg18569_280080
## 115.0606838 115.0606838 115.0606838
## Peex113Ctg18575_20244 Peex113Ctg18560_448542 Peex113Ctg18560_448535
## 115.0606838 115.0606838 115.0606838
## Peex113Ctg17898_879722 Peex113Ctg16000_75976 Peex113Ctg17976_394902
## 115.0606838 115.0606838 115.0606838
## Peex113Ctg17976_394915 Peex113Ctg17976_470726 Peex113Ctg17977_595651
## 115.0606838 115.0606838 115.0606838
## Peex113Ctg18564_75351 Peex113Ctg11805_116254 Peex113Ctg18564_75228
## 115.0606838 116.6490725 116.6490725
## Peex113Ctg13981_129677 Peex113Ctg01361_24163 Peex113Ctg01386_32712
## 118.2568005 118.2568005 118.2568005
## Peex113Ctg14844_219591 Peex113Ctg14844_219578 Peex113Ctg14844_219724
## 118.2568005 118.2568005 118.2568005
## Peex113Ctg18551_162160 Peex113Ctg18551_162184 Peex113Ctg18554_204404
## 118.2568005 118.2568005 118.2568005
## Peex113Ctg00675_49295 Peex113Ctg00133_44649 Peex113Ctg17878_842192
## 119.9594869 119.9594869 121.5846463
## Peex113Ctg17878_842332 Peex113Ctg18684_315511 Peex113Ctg14844_397310
## 121.5846463 121.5846463 121.5846463
## Peex113Ctg17995_401824 Peex113Ctg17878_842145 Peex113Ctg07096_63136
## 122.8473697 122.8473697 122.8473697
## Peex113Ctg07222_44677 Peex113Ctg14844_72922 Peex113Ctg14665_15250
## 122.8473697 124.8971498 124.8971498
## Peex113Ctg14844_219569 Peex113Ctg18039_701514 Peex113Ctg18039_701496
## 124.8971498 124.8971498 124.8971498
## Peex113Ctg18039_885833 Peex113Ctg18032_165069 Peex113Ctg18033_188220
## 124.8971498 126.1531670 126.1531670
## Peex113Ctg18033_188227 Peex113Ctg18039_701446 Peex113Ctg09060_154413
## 126.1531670 126.1531670 126.1531670
## Peex113Ctg18039_997060 Peex113Ctg09060_154345 Peex113Ctg14665_15228
## 126.1531670 126.1531670 126.1531670
## Peex113Ctg14569_155929 Peex113Ctg14603_71136 Peex113Ctg14665_15184
## 126.1531670 126.1531670 126.1531670
## Peex113Ctg18048_486597 Peex113Ctg18039_701325 Peex113Ctg18048_486567
## 127.9686933 127.9686933 127.9686933
## Peex113Ctg18048_486591 Peex113Ctg18048_486592 Peex113Ctg18048_486596
## 127.9686933 127.9686933 127.9686933
## Peex113Ctg00724_110996 Peex113Ctg00732_42016 Peex113Ctg18704_324352
## 130.3289409 130.3289409 130.3289409
## Peex113Ctg18321_355103 Peex113Ctg18704_324453 Peex113Ctg18704_324482
## 130.3289409 130.3289409 132.0070703
## Peex113Ctg18704_328307 Peex113Ctg17980_234395 Peex113Ctg17980_256247
## 132.0070703 133.6705732 133.6705732
## Peex113Ctg18048_297027 Peex113Ctg18048_486663 Peex113Ctg18048_486691
## 133.6705732 133.6705732 133.6705732
## Peex113Ctg18048_486711 Peex113Ctg18472_1369045 Peex113Ctg00675_49286
## 133.6705732 135.9585454 135.9585454
## Peex113Ctg18241_430210 Peex113Ctg10921_365598 Peex113Ctg17917_182271
## 135.9585454 135.9585454 135.9585454
## Peex113Ctg17917_182284 Peex113Ctg17917_182999 Peex113Ctg18244_266269
## 135.9585454 135.9585454 135.9585454
## Peex113Ctg18474_415683 Peex113Ctg18472_1405321 Peex113Ctg18472_1405386
## 135.9585454 135.9585454 135.9585454
## Peex113Ctg00825_221647 Peex113Ctg10921_452643 Peex113Ctg18241_430417
## 135.9585454 135.9585454 135.9585454
## Peex113Ctg00825_221646 Peex113Ctg14969_121398 Peex113Ctg09482_37421
## 138.2263470 140.0745656 142.1132943
## Peex113Ctg17849_345861 Peex113Ctg17849_590492 Peex113Ctg17851_317860
## 142.1132943 142.1132943 142.1132943
## Peex113Ctg17851_318051 Peex113Ctg17851_488084 Peex113Ctg17851_488144
## 142.1132943 142.1132943 142.1132943
## Peex113Ctg17849_345854 Peex113Ctg17899_673736 Peex113Ctg18308_61078
## 144.1680766 146.0939891 146.0939891
## Peex113Ctg18308_60876 Peex113Ctg18321_531358 Peex113Ctg17795_207334
## 146.0939891 146.0939891 146.0939891
## Peex113Ctg17782_378627 Peex113Ctg18307_162364 Peex113Ctg18307_162308
## 146.0939891 146.0939891 146.0939891
## Peex113Ctg18293_356552 Peex113Ctg02378_57283 Peex113Ctg18077_95330
## 147.9902997 147.9902997 147.9902997
## Peex113Ctg18077_213781 Peex113Ctg18077_644251 Peex113Ctg17801_106287
## 147.9902997 147.9902997 147.9902997
## Peex113Ctg17801_119184 Peex113Ctg17562_47266 Peex113Ctg17796_213674
## 147.9902997 147.9902997 147.9902997
## Peex113Ctg17796_423919 Peex113Ctg18491_144871 Peex113Ctg18175_206905
## 147.9902997 147.9902997 147.9902997
## Peex113Ctg17786_395689 Peex113Ctg18300_112759 Peex113Ctg18524_345734
## 147.9902997 147.9902997 147.9902997
## Peex113Ctg17555_144155 Peex113Ctg17558_384569 Peex113Ctg18290_473054
## 147.9902997 147.9902997 147.9902997
## Peex113Ctg18524_383463 Peex113Ctg18524_345838 Peex113Ctg17782_386057
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg18000_209744 Peex113Ctg18177_1021727 Peex113Ctg17990_411539
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg17992_306374 Peex113Ctg17543_339206 Peex113Ctg18535_553968
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg17994_1126260 Peex113Ctg18128_18953 Peex113Ctg18189_741852
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg07717_90189 Peex113Ctg07717_90377 Peex113Ctg17779_287139
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg17786_52263 Peex113Ctg17786_74627 Peex113Ctg18010_293657
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg18192_141229 Peex113Ctg18472_798455 Peex113Ctg18663_230243
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg18297_45555 Peex113Ctg01095_110011 Peex113Ctg17884_164472
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg05516_90969 Peex113Ctg18297_442075 Peex113Ctg18298_244631
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg18300_112743 Peex113Ctg05516_91001 Peex113Ctg05595_156157
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg05596_329998 Peex113Ctg05695_346092 Peex113Ctg07964_27066
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg00799_190624 Peex113Ctg00994_37422 Peex113Ctg00994_37425
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg10947_182884 Peex113Ctg12981_135809 Peex113Ctg01272_54235
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg13854_123235 Peex113Ctg17545_482530 Peex113Ctg17550_642096
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg17550_642101 Peex113Ctg17550_642137 Peex113Ctg17664_834156
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg17805_232163 Peex113Ctg17861_512516 Peex113Ctg17861_512528
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg17861_512562 Peex113Ctg17861_512572 Peex113Ctg17899_673749
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg17903_1178462 Peex113Ctg17917_394608 Peex113Ctg17958_102587
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg17978_199737 Peex113Ctg18043_230175 Peex113Ctg18162_549416
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg18302_307025 Peex113Ctg18472_105306 Peex113Ctg18472_105334
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg18472_105371 Peex113Ctg18491_30670 Peex113Ctg18491_30686
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg18492_1047407 Peex113Ctg18498_378930 Peex113Ctg18524_30502
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg18524_30534 Peex113Ctg18524_39246 Peex113Ctg18633_48928
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg18712_173300 Peex113Ctg18492_1047211 Peex113Ctg18633_49044
## 149.5565888 149.5565888 149.5565888
## Peex113Ctg18128_15571 Peex113Ctg07899_24379 Peex113Ctg07899_24371
## 149.5565888 151.1262275 151.1262275
## Peex113Ctg07899_24395 Peex113Ctg07899_40476 Peex113Ctg17543_100270
## 151.1262275 151.1262275 151.1262275
## Peex113Ctg15500_18609 Peex113Ctg07888_66618 Peex113Ctg07899_24268
## 151.1262275 151.1262275 151.1262275
## Peex113Ctg07899_24321 Peex113Ctg07899_24355 Peex113Ctg07899_24356
## 151.1262275 151.1262275 151.1262275
## Peex113Ctg18661_346499 Peex113Ctg18325_192666 Peex113Ctg05596_218706
## 151.1262275 151.1262275 151.1262275
## Peex113Ctg05596_218721 Peex113Ctg05596_329971 Peex113Ctg17884_167193
## 151.1262275 151.1262275 151.1262275
## Peex113Ctg18684_315575 Peex113Ctg17796_213587 Peex113Ctg17786_371785
## 151.1262275 151.1262275 152.9366904
# ctgnames <- as.vector(names(map1$imputed.geno$L.1$map))
# ctgs <- vector() ; for(i in ctgnames){ctgs[i] <- as.vector(strsplit(i, split = "_")[[1]])[1]} ; names(ctgs) <- NULL
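# ctgs <- as.data.frame(map1$imputed.geno$L.12$map) # alternative: unique (imputed) markers of L.12
ctgs <- as.data.frame(map1$geno$L.1$map) # full map of L.1
# marker names are "<contig>_<position>"; the contig IDs here are 15 characters long
ctgs[,"Ctg"] <- substr(rownames(ctgs), start=1, stop=15)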
write.csv(ctgs, "ctgs.csv")
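The substr() call relies on every contig ID being exactly 15 characters long. A pattern-based split is a little more robust; the sketch below shows one way to do it, using two marker names taken from the L.1 map printed above. The object names are illustrative.

```r
# Two marker names of the form "<contig>_<position>"
marker_names <- c("Peex113Ctg17958_624309", "Peex113Ctg00860_116302")

# Contig ID: everything before the trailing "_<digits>"
contig <- sub("_[0-9]+$", "", marker_names)

# Position on the contig: the digits after the last underscore
position <- as.integer(sub("^.*_", "", marker_names))

data.frame(marker = marker_names, Ctg = contig, pos = position)
```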
Find the gene name in the P. exserta annotation file, look up its CDS in the P. exserta MRNA file, and BLAST it against the newest P. axillaris genome to check which chromosome it lies on. The results are collected in a table in an Excel file; write down the position of each marker on the chromosome.
Focus on markers at the beginning and end of a chromosome, where the data is more reliable.
# from where P.exserta annotation is stored
grep '^Peex113Ctg08628\speex113\sgene' P.EXSERTA.contigs.v1.1.3.annotation.v1.gff
# copy out name and search with vim in file
vim P.EXSERTA.contigs.v1.1.3.annotation.v1.MRNA.fasta # search with '/genename'
# copy out CDS and blast with SequenceServer
lgpax <- read.table("PaxChr.csv", header=T, sep = ",")
ggparallel(list('Linkage.group', 'AX.chromosome.best.match'), lgpax)
Are the same contigs together on a chromosome and on a super-scaffold?
ss <- read.table("OMss.csv", sep=",", header=T)
ss[,"OM"] <- as.numeric(as.factor(ss$Super.Scaffold.OM))
ggparallel(list('Linkage.Group','OM'), ss)
We do not see any overlap, which is great.
CAPS markers are markers which have been associated with a chromosome. Align the CAPS sequences to the NGS genome and check which contigs are listed in the hits.
A BLAST database of the P. exserta genome was built with makeblastdb -in P.EXSERTA.contigs.v1.1.3.fasta -dbtype nucl -parse_seqids. Copy the sequence and name of each marker into the file query.fasta and then run blastn -db P.EXSERTA.contigs.v1.1.3.fasta -query query.fasta -out results.out. Extract the contig names from the results file with grep Ctg results.out and search for them in the linkage groups of the genetic map.
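The final lookup step, finding which linkage group contains a contig hit, can also be done in R instead of searching by hand. The sketch below uses made-up hits and a small hand-written list of marker names per linkage group; both are placeholders for illustration.

```r
# Hypothetical BLAST hits (contig IDs) and marker names per linkage group
hit_contigs <- c("Peex113Ctg17958", "Peex113Ctg99999")
lg_markers  <- list(
  L.1 = c("Peex113Ctg17958_624309", "Peex113Ctg17961_1433683"),
  L.2 = c("Peex113Ctg00001_100")
)

# Strip the position suffix so that only contig IDs remain per linkage group
lg_contigs <- lapply(lg_markers, function(m) unique(sub("_[0-9]+$", "", m)))

# For each hit, collect the linkage groups that contain the contig (NA if none)
hit_lgs <- vapply(hit_contigs, function(ctg) {
  found <- names(lg_contigs)[vapply(lg_contigs, function(x) ctg %in% x, logical(1))]
  if (length(found) == 0) NA_character_ else paste(found, collapse = ",")
}, character(1))
hit_lgs
```

With the real map, a list like lg_markers could presumably be built with lapply(map1$geno, function(ch) names(ch$map)), since the marker names are stored as the names of each chromosome's map vector.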
#write.cross(map1, format="qtab") # and only use data_location.qtab for searching Ctg names.
caps <- read.table("overviews/caps.csv", sep="\t", header=F)
caps <- cbind(caps, as.numeric(substring(as.character(caps$V4), first=3)))
names(caps)[6] <- "LGs"
names(caps)[1] <- "Pax_Chr"
ggparallel(list('LGs','Pax_Chr'), caps, sub="chr7 absent")
The CAPS markers did not help to identify which LG corresponds to which chromosome. Either the P. axillaris assembly or the genetic map is full of errors.
Further process the file with ALLMAPS